Abstract - The arbitrarily varying channel (AVC) is a channel
model whose state is selected maliciously by an adversary. Fixed- 
blocklength coding assumes a worst-case bound on the adver- 
sary's capabilities, which leads to pessimistic results. This paper 
defines a variable-length perspective on this problem, for which 
achievable rates are shown that depend on the realized actions of 
the adversary. Specifically, rateless codes are constructed which 
require a limited amount of common randomness. These codes 
are constructed for two kinds of AVC models. In the first the 
channel state cannot depend on the channel input, and in the 
second it can. As a byproduct, the randomized coding capacity 
of the AVC with state depending on the transmitted codeword 
is found and shown to be achievable with a small amount of 
common randomness. The results for this model are proved using 
a randomized strategy based on list decoding. 

Fig. 1. Rateless communication system. The encoder and decoder share
a source of common randomness. A single bit of feedback is available 
every c channel uses for the decoder to terminate transmission. Some partial 
information about the channel state is available at the decoder every c channel 
uses in a causal fashion. 



I. Introduction 

Modern communication platforms such as sensor networks, 
wireless ad-hoc networks, and cognitive radio involve com- 
munication in environments that are difficult to model. This 
difficulty may stem from the cost of measuring channel 
characteristics, the behavior of other users, or the interaction 
of heterogeneous systems using the same resources. These 
complex systems may use extra resources such as feedback 
on a low-rate control channel or common randomness to 
overcome this channel uncertainty. We are interested in how 
such resources can be used to deal with interference that is 
difficult to model or which may depend on the transmitted 
codeword. 

Inspired by some of these challenges, we approach the 
problem from the perspective of variable-length coding over 
arbitrarily varying channels (AVCs). The AVC is an adversarial 
channel model in which the channel is governed by a time-varying
state controlled by a jammer who wishes to maximize
the decoding error probability. For fixed-blocklength coding, 
the capacity is the worst-case over all allowable actions of the 
jammer. However, in some cases the worst-case may be unduly 
pessimistic. Correspondingly, we ask the following questions:
can variable-length codes be developed for AVC models
that adapt to the realized actions of the jammer? How much 
feedback and common randomness is needed to enable these 
codes? 

In this paper we study randomized coding for two different
models based on the AVC. In a randomized code the encoder 
and decoder have a shared source of common randomness 
unknown to the jammer. This common randomness acts as 
a shared key to mask the coding strategy from the jammer. 
The first model we study is the AVC under maximal error 
and randomized coding, in which the state sequence is chosen 
independently of the transmitted codeword. The second model 
is an AVC in which the jammer can choose the state sequence 
based on the transmitted codeword. This may be an appropriate 
model for a multi-hop network in which an internal node 
becomes compromised and tampers with transmitted packets. 
We call this situation an AVC with "nosy noise." Our first 
result is a formula for the randomized coding capacity of this 
AVC. Our proof uses results on list decoding for AVCs [3]- 
[5] with a partial derandomization technique used by Langberg 
[6]. 

The main focus of this paper is on the problem of rateless 
coding for these channels using limited common randomness 
and partial channel state information, as shown in Figure 1.
Rateless codes were first proposed for erasure channels
[7], [8] and compound channels [9], [10], and a general
model is discussed in [11]. They are strategies that allow a
single-bit feedback signal (often called an ACK/NACK for
"acknowledge"/"not acknowledge") every c channel uses to
terminate transmission based on the observed channel output 
y and channel state information. In our model, the partial state 
information takes the form of estimates of the average channel 
induced by the channel state s over "chunks" of size c. In 
practice this channel information may come from exogenous 
measurements or from training information in the forward link, 
as in [12]. 

We propose a model for partial state information at the 
decoder which consists of an estimate of the empirical chan- 
nel. We then provide partially derandomized rateless code 
constructions for the two AVC models. These codes have
fixed input type and are piecewise constant-composition, and
for accurate partial state information can achieve rates close
to the mutual information of a corresponding AVC. The
derandomization for these codes comes from strategies of
Ahlswede [13] and Langberg [6].

Related work and context 

The arbitrarily varying channel was first studied in the 
seminal paper of Blackwell, Breiman, and Thomasian [14], 
who found a formula for the capacity under randomized coding 
and maximal error. Without randomized coding, the maximal-
error problem is significantly harder [15]-[18] and is related
to the zero-error capacity [19]. The AVC model was
extended to include constraints on the jammer by Hughes
and Narayan [20] and Csiszár and Narayan [21]-[23]. For
randomized coding, error exponents have also been studied 
[24]-[26]. 

Ahlswede's landmark paper [13] showed that the average
error capacity under deterministic coding $C_d$ is either 0 or equal
to the randomized coding capacity $C_r$. Randomized coding
gives the same capacity under maximal and average error, but
for deterministic coding under average error the capacity may
be positive and strictly smaller than the randomized coding
capacity when cost constraints are involved [22]. However,
Ahlswede's technique can be used to show that only $O(\log n)$
bits of common randomness are needed to achieve $C_r(A)$ for
AVCs with cost constraint $A$.

The "nosy noise" model, shown in Figure 3, has been
discussed previously in the AVC literature, where it is sometimes
called the A*VC. For deterministic coding, knowing
the message is the same as knowing the codeword, so the
maximal error capacity is the same as the nosy noise capacity [27,
Problem 2.6.21]. In some cases the average error capacity
is also the same [16]. The capacity under noiseless feedback 
was later found by Ahlswede [28]. To our knowledge, for 
cost-constrained AVCs the problem was not studied until 
Langberg [6] found the capacity for bit-flipping channels with 
randomized coding. Smith [29] has shown a computationally 
efficient construction using $O(n)$ bits of common randomness.
Agarwal, Sahai and Mitter proposed a similar model with a 
distortion constraint [30], which is different than the AVC 
model considered here [5]. 

Our study of rateless codes is inspired by hybrid-ARQ [31]
and recent work that has shown how zero-rate feedback can 
improve channel reliability [32]-[34]. In [12] the encoder and 
decoder use randomly placed training sequences to estimate 
the channel quality. Another inspiration was the paper of 
Draper et al. [35], which studies an AVC model where the
entire state sequence is given to the decoder as side information
and single-bit feedback acts as an ACK/NACK to terminate
decoding. Our coding schemes can be used to provide a 
component of the coding strategy of [12], which shows that 
the rates achievable by Shayevitz and Feder [36] for individual 
sequence channels are also achievable with zero-rate feedback. 

In the next section we describe the channel model and in
Section III we state the main contributions of this paper. The
two derandomization strategies are discussed in Section IV,
where we also find the capacity of AVCs with "nosy noise."
Sections V and VI contain our rateless code constructions
for channels with input-independent and input-dependent state,
respectively.

II. Channel models and definitions 

We will model our time-varying channel by a set of
channels $\mathcal{W} = \{W(y|x, s) : s \in \mathcal{S}\}$ with finite input
alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, and constrained state
sequence [21]. This is an arbitrarily varying channel (AVC)
model. If $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ and
$\mathbf{s} = (s_1, s_2, \ldots, s_n)$ are length $n$ vectors, the probability of
observing the output $\mathbf{y}$ given the input $\mathbf{x}$ and state $\mathbf{s}$ over the
AVC $\mathcal{W}$ without feedback is given by:

$$ W(\mathbf{y} | \mathbf{x}, \mathbf{s}) = \prod_{i=1}^{n} W(y_i | x_i, s_i) . \tag{1} $$

In this paper the feedback is used only to terminate trans- 
mission, and we compare our achievable rates with those 
achievable without feedback (cf. [11]). The interpretation
of (1) is that the channel state can change arbitrarily from
time to time. The AVC is an adversarial model in which the 
state is controlled by a jammer who wishes to stymie the 
communication between the encoder and decoder. As we will 
see, the knowledge held by the adversary can be captured in 
the error criterion. 

One extension of this model is to introduce constraints on 
the input and state sequences [21]. For simplicity we will 
only assume constraints on the state. Let $l : \mathcal{S} \to \mathbb{R}^{+}$
be a cost function on the state set, where $\min_{s} l(s) = 0$
and $\max_{s \in \mathcal{S}} l(s) = A^{*} < \infty$. The cost of the vector $\mathbf{s} =
(s_1, s_2, \ldots, s_n)$ is the sum of the costs of its elements:

$$ l(\mathbf{s}) = \sum_{i=1}^{n} l(s_i) . \tag{2} $$

In some cases we will impose a total constraint $A$ on the
average cost, so that

$$ l(\mathbf{s}) \le nA . \tag{3} $$

If $A \ge A^{*}$ we say the state is unconstrained. We will define
the set

$$ \mathcal{S}^{n}(A) = \{\mathbf{s} : l(\mathbf{s}) \le nA\} \tag{4} $$

to be the set of sequences with average cost less than or equal 
to A. 
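
To make (1) through (4) concrete, the following Python sketch evaluates the product-form channel probability and checks the cost constraint for a state sequence. The nested-list layout `W[s][x][y]`, the function names, and the toy modulo-two adder channel are assumptions chosen for this illustration only.

```python
import math

def sequence_probability(W, x, s, y):
    """Product form (1): W(y|x,s) = prod_i W(y_i | x_i, s_i), with W[s][x][y] per letter."""
    return math.prod(W[si][xi][yi] for xi, si, yi in zip(x, s, y))

def sequence_cost(s, cost):
    """Additive cost (2): l(s) = sum_i l(s_i)."""
    return sum(cost[si] for si in s)

def satisfies_constraint(s, cost, A):
    """Membership in the set (4): l(s) <= n * A."""
    return sequence_cost(s, cost) <= len(s) * A

# Toy channel (an assumption): y = x XOR s, stored as W[s][x][y].
W_XOR = [[[1.0 if y == (x ^ s) else 0.0 for y in (0, 1)]
          for x in (0, 1)] for s in (0, 1)]
COST = [0, 1]  # l(s) = s, so the cost counts the flipped positions
```

With this toy channel the output is a deterministic function of the input and state, so the product in (1) is either 0 or 1; a noisy per-letter channel would simply put fractional entries in `W`.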

A. Point-to-point channel coding 

An $(n, N)$ deterministic code $\mathcal{C}$ for the AVC $\mathcal{W}$ is a pair of
maps $(\phi, \psi)$ with $\phi : [N] \to \mathcal{X}^{n}$ and $\psi : \mathcal{Y}^{n} \to [N]$. The rate
of the code is $n^{-1} \log N$. The decoding region for message $i$ is
$D_i = \{\mathbf{y} : \psi(\mathbf{y}) = i\}$. We can also write a deterministic code
$\mathcal{C}$ as a set of pairs $\{(\mathbf{x}(i), D_i) : i \in [N]\}$ with the encoder
$\phi$ and decoder $\psi$ defined implicitly. The error for message $i$
and state sequence $\mathbf{s} \in \mathcal{S}^{n}(A)$ is given by

$$ \varepsilon(i, \mathbf{s}) = 1 - W(D_i | \mathbf{x}(i), \mathbf{s}) . \tag{5} $$

Fig. 2. An arbitrarily varying channel with randomized encoding. The
encoder and decoder share a secret key in $[K]$ that is unknown to the jammer.

Fig. 3. The nosy noise error model - the jammer knows the codeword $\phi_k(i)$.



An $(n, N)$ randomized code $\mathbf{C}$ for the AVC $\mathcal{W}$ is a random
variable taking on values in the set of deterministic codes.
It is written as a pair of random maps $(\Phi, \Psi)$ where each
realization is an $(n, N)$ deterministic code. If $(\Phi, \Psi)$ almost
surely takes values in a set of $K$ codes, then we call this an
$(n, N, K)$ randomized code. We can also think of an $(n, N, K)$
randomized code as a family of codes $\{(\phi_k, \psi_k) : k \in [K]\}$
indexed by a set of $K$ keys, as shown in Figure 2. The key
size of a randomized code $(\Phi, \Psi)$ is the entropy $H(\mathbf{C})$ of
the code. In the case where $\mathbf{C}$ is uniformly distributed on a
set of $K$ codes, the key size is simply $\log K$. Note that the
realization of the code is shared by the encoder and decoder,
so the key is known by both parties. The rate of the code is
$R = n^{-1} \log N$. The decoding region for message $i$ under key
$k$ is $D_{i,k} = \{\mathbf{y} : \psi_k(\mathbf{y}) = i\}$. In the case where the bound on
$K$ is not explicit or unspecified, we write the random decoding
region for message $i$ as $D_i = \{\mathbf{y} : \Psi(\mathbf{y}) = i\}$.

For a randomized code we require the decoder error to
be small for each message, averaged over key values.
Randomization allows several different codewords to represent 
the same message. For maximal error, there are two cases to 
consider, depending on whether or not the state can depend 
on the actual codeword. 

The standard maximal error for an $(n, N)$ randomized code
over an AVC $\mathcal{W}$ with cost constraint $A$ is given by

$$ \max_{i} \max_{\mathbf{s} \in \mathcal{S}^{n}(A)} \mathbb{E}\left[ 1 - W(D_i | \Phi(i), \mathbf{s}) \right] , \tag{6} $$

where the expectation is over the randomized code $(\Phi, \Psi)$.
Here the variables $D_i$ and $\Phi(i)$ correspond to the same
realization of the key. The nosy maximal error for an $(n, N)$
randomized code over an AVC $\mathcal{W}$ with cost constraint $A$ is
given by

$$ \max_{i} \max_{J : \mathcal{X}^{n} \to \mathcal{S}^{n}(A)} \mathbb{E}\left[ 1 - W(D_i | \Phi(i), J(\Phi(i))) \right] , \tag{7} $$

where the expectation is over the randomized code $(\Phi, \Psi)$.
Again, the variables $D_i$, $\Phi(i)$, and $J(\Phi(i))$ correspond to the
same realization of the key. We call an AVC under the nosy
maximal error criterion an AVC with nosy noise. Figure 3
shows the channel model under the nosy noise assumption. In
the AVC with nosy noise, the jammer's strategies take the form
of mappings $J : \mathcal{X}^{n} \to \mathcal{S}^{n}(A)$ from the codeword vectors to
state sequences. This is a more pessimistic assumption on the 
jammer's capabilities, since it assumes that it has noncausal 
access to the transmitted codeword. Under randomized coding 
we will show that from a capacity standpoint all that matters 
is whether the jammer has access to the current input symbol. 

A rate $R$ is called achievable if for every $\epsilon > 0$ and $\delta > 0$ there
exists a sequence of $(n, N)$ codes of rate $R_n \ge R - \delta$
whose probability of error (maximal or nosy) is at most $\epsilon$.
Whether $R$ is achievable will depend on the error criterion
(maximal or nosy). For a given error criterion, the supremum
of achievable rates is the capacity of the arbitrarily varying
channel. We will write $C_r(A)$ for the randomized coding
capacity under maximal error with constraint $A$, and $\hat{C}_r(A)$
for the randomized coding capacity with nosy noise and state
constraints.

B. Information quantities 

For a fixed input distribution $P(x)$ on $\mathcal{X}$ and channel
$V(y|x)$, we will use the notation $I(P, V)$ to denote the mutual
information between the input and output of the channel. For
a finite or closed and convex set of channels $\mathcal{V}$ we use the
shorthand

$$ I(P, \mathcal{V}) = \min_{V \in \mathcal{V}} I(P, V) . \tag{8} $$



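We define the following sets:

$$ \mathcal{Q}(A) = \left\{ Q \in \mathcal{P}(\mathcal{S}) : \sum_{s} Q(s) l(s) \le A \right\} \tag{9} $$

$$ \mathcal{U}(P, A) = \left\{ U \in \mathcal{P}(\mathcal{S}|\mathcal{X}) : \sum_{s, x} U(s|x) P(x) l(s) \le A \right\} . \tag{10} $$

For an AVC $\mathcal{W} = \{W(y|x, s) : s \in \mathcal{S}\}$ with state constraint
$A$ we define two sets of channels:

$$ \mathcal{W}_{\mathrm{std}}(A) = \left\{ V(y|x) = \sum_{s} W(y|x, s) Q(s) : Q(s) \in \mathcal{Q}(A) \right\} \tag{11} $$

$$ \mathcal{W}_{\mathrm{dep}}(P, A) = \left\{ V(y|x) = \sum_{s} W(y|x, s) U(s|x) : U(s|x) \in \mathcal{U}(P, A) \right\} . \tag{12} $$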

We will suppress the explicit dependence on $A$. The set in
(11) is called the convex closure of $\mathcal{W}$, and the set in (12) is
the row-convex closure of $\mathcal{W}$. In earlier works $\mathcal{W}_{\mathrm{dep}}(P, A)$ is
sometimes written as $\overline{\overline{\mathcal{W}}}$ [13].

Two information quantities of interest in randomized coding
for AVCs are

$$ C_{\mathrm{std}}(A) = \max_{P} \min_{V \in \mathcal{W}_{\mathrm{std}}(A)} I(P, V) \tag{13} $$

$$ C_{\mathrm{dep}}(A) = \max_{P} \min_{V \in \mathcal{W}_{\mathrm{dep}}(P, A)} I(P, V) . \tag{14} $$

Csiszár and Narayan [21] showed that the randomized coding
capacity under maximal error $C_r(A)$ is equal to $C_{\mathrm{std}}(A)$. In
Theorem 1 we show that the randomized coding capacity
under nosy noise $\hat{C}_r(A)$ is equal to $C_{\mathrm{dep}}(A)$.
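
As a numerical illustration of the max-min expression for the standard quantity in (13), the following Python sketch estimates it by grid search for a channel with binary input and binary state. Everything named here (the `W[s][x][y]` layout, the function names, the grid resolution, base-2 logarithms, and the modulo-two adder used as a test channel) is an assumption made for illustration and is not a construction from the paper.

```python
import math

def mutual_information(P, V):
    """I(P, V) in bits for input distribution P[x] and channel V[x][y]."""
    num_outputs = len(V[0])
    out = [sum(P[x] * V[x][y] for x in range(len(P))) for y in range(num_outputs)]
    total = 0.0
    for x, px in enumerate(P):
        for y in range(num_outputs):
            joint = px * V[x][y]
            if joint > 0.0:
                total += joint * math.log2(V[x][y] / out[y])
    return total

def averaged_channel(W, Q):
    """V(y|x) = sum_s W(y|x,s) Q(s), with W stored as W[s][x][y]."""
    num_inputs, num_outputs = len(W[0]), len(W[0][0])
    return [[sum(Q[s] * W[s][x][y] for s in range(len(Q)))
             for y in range(num_outputs)] for x in range(num_inputs)]

def c_std_grid(W, cost, A, grid=100):
    """Grid-search estimate of max_P min_Q I(P, V) over state distributions with mean cost <= A."""
    best = 0.0
    for i in range(grid + 1):
        P = [1 - i / grid, i / grid]
        worst = float("inf")
        for j in range(grid + 1):
            Q = [1 - j / grid, j / grid]
            if Q[0] * cost[0] + Q[1] * cost[1] <= A + 1e-12:
                worst = min(worst, mutual_information(P, averaged_channel(W, Q)))
        best = max(best, worst)
    return best

def binary_entropy(t):
    """Binary entropy in bits."""
    return 0.0 if t in (0.0, 1.0) else -t * math.log2(t) - (1 - t) * math.log2(1 - t)

# Test channel (an assumption): y = x XOR s with cost l(s) = s.
W_XOR = [[[1.0 if y == (x ^ s) else 0.0 for y in (0, 1)]
          for x in (0, 1)] for s in (0, 1)]
```

For the modulo-two adder with $l(s) = s$ and $A = 0.1$, `c_std_grid(W_XOR, [0, 1], 0.1)` returns about 0.531, which matches $1 - h_b(0.1)$ in bits. A grid search is only practical for very small alphabets; it is used here because it keeps the max-min structure visible.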

C. Rateless codes 

In a rateless code, the decoder can choose to decode at 
different times based on its observation of the channel output. 
We assume that the decoder can inform the encoder that it 
has decoded in order to terminate transmission. To simplify 
the analysis, we consider rateless codes that operate in chunks 
of length $c(n)$. For a vector $\mathbf{z}$ let $z_1^{r}$ denote $(z_1, z_2, \ldots, z_r)$,
and $z^{(mc)}$ denote the $m$-th chunk $(z_{(m-1)c+1}, \ldots, z_{mc})$.

The key quantity is the time at which the decoder attempts 
to decode, which we will denote by Mc(n), i.e., decoding 
is attempted after M chunks. If this decoding time is appro- 
priately chosen, then the decoding is successful (with high 
probability); the corresponding empirical rate is given by 

$$ R = \frac{1}{Mc} \log_2 N , \tag{15} $$



where N is the number of codewords in the codebook. 
Defining a rateless code involves not only a codebook, but also 
a rule according to which the decoder selects the appropriate 
decoding time M. In our considerations, the decoder performs 
this selection based on side information about the true channel 
state (and thus, about the actions of the adversary), which the 
decoder receives at the end of each chunk. 

More formally, we denote the partial side information
(channel estimate) given to the decoder after the $m$-th chunk
by $\mathcal{V}_m$, which takes values in a set $\mathbb{V}(c)$. We describe the side
information model in Section II-D. A $(c, N, K)$ randomized
rateless code is a set of maps $\{(\Phi_m, \tau_m, \Psi_m) : m = 1, 2, \ldots\}$:

$$ \Phi_m : [N] \times [K] \to \mathcal{X}^{c} \tag{16} $$
$$ \tau_m : \mathcal{Y}^{mc} \times \mathbb{V}(c)^{m} \times [K] \to \{0, 1\} \tag{17} $$
$$ \Psi_m : \mathcal{Y}^{mc} \times \mathbb{V}(c)^{m} \times [K] \to [N] . \tag{18} $$

To encode chunk $m$, the encoding function $\Phi_m$ uses the
message in $[N]$ and key in $[K]$ to choose a vector of $c$ channel
inputs.

The decision function $\tau_m$ defines a random variable, called
the decoding time $\mathbf{M}$ of the rateless code:

$$ \mathbf{M} = \min\left\{ m : \tau_m(y_1^{mc}, \mathcal{V}_1^{m}, k) = 1 \right\} . \tag{19} $$

Let $\mathcal{M} = \{M_*, M_* + 1, \ldots, M^*\}$ be the smallest interval
containing the support of $\mathbf{M}$. The set of possible (empirical)
rates for the rateless code is given by $\{(mc)^{-1} \log N : m \in
\mathcal{M}\}$.

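We can define decoding regions for the rateless code at
a decoding time $\mathbf{M} = M$. Note that if $\mathbf{M} = M$ we
have $\tau_M(y_1^{Mc}, \mathcal{V}_1^{M}, k) = 1$. For message $i$, key $k$ and side
information vector $\mathcal{V}_1^{M}$ we can define a decoding region:

$$ D_{i,k}(\mathcal{V}_1^{M}) = \left\{ y_1^{Mc} : \tau_M(y_1^{Mc}, \mathcal{V}_1^{M}, k) = 1, \ \Psi_M(y_1^{Mc}, \mathcal{V}_1^{M}, k) = i \right\} . \tag{20} $$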


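The maximal and nosy noise error for a $(c, N, K)$ rateless
code at decoding time $\mathbf{M} = M$ are, respectively,

$$ \varepsilon(M, \mathbf{s}, \mathcal{V}_1^{M}) = \max_{i \in [N]} \frac{1}{K} \sum_{k=1}^{K} \left( 1 - W^{Mc}\left( D_{i,k}(\mathcal{V}_1^{M}) \,|\, \Phi_1^{M}(i, k), s_1^{Mc} \right) \right) \tag{21} $$

$$ \varepsilon(M, J, \mathcal{V}_1^{M}) = \max_{i \in [N]} \frac{1}{K} \sum_{k=1}^{K} \left( 1 - W^{Mc}\left( D_{i,k}(\mathcal{V}_1^{M}) \,|\, \Phi_1^{M}(i, k), J_M(i, \Phi_1^{M}(i, k)) \right) \right) . \tag{22} $$

Here $J = (J_1, \ldots, J_M)$ and $J_M : [N] \times \mathcal{X}^{Mc} \to \mathcal{S}^{Mc}$ is the
adversary's strategy. Note that in these error definitions we do
not take the maximum over all $\mathbf{s}$ or $J$, because the rate and
error at which we decode will depend on the realized state
sequence, in contrast to the point-to-point AVC errors in (6)
and (7).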

Because we consider rateless codes with finite total block- 
length n, under some state sequences the decoder may never 
decide to decode. Intuitively, this is because the channel is too 
noisy. In order to quantify the performance of a rateless code, 
we must specify the set of state sequences for which the code 
will decode. 

D. Partial channel state information 

Suppose that during the $m$-th chunk of channel uses $\{(m-1)c + 1, \ldots, mc\}$
the channel inputs were $x^{(mc)}$ and the state
was $s^{(mc)}$. Under the maximal error criterion, we define the
average channel under $s$ during the $m$-th chunk by

$$ V_m(y|x) = \frac{1}{c} \sum_{t=(m-1)c+1}^{mc} W(y|x, s_t) . $$

Under the nosy noise criterion we define the average channel
under $x$ and $s$ by

$$ V_m(y|x) = \frac{1}{N(x | x^{(mc)})} \sum_{t=(m-1)c+1}^{mc} W(y|x_t, s_t) \mathbf{1}(x_t = x) , \tag{23} $$

where $N(x | x^{(mc)})$ is the number of positions $t$ in the $m$-th chunk with $x_t = x$.

A receiver with full side information would learn the channel
$V_m$ explicitly. We consider instead the case where the receiver
is given a set $\mathcal{V}_m$ after the $m$-th chunk, where $\mathcal{V}_m$ is a subset
of channels such that $V_m(y|x) \in \mathcal{V}_m$.
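
The chunk-averaged channel under the maximal error criterion can be computed directly from a state sequence. The sketch below is a minimal illustration, assuming a per-letter channel stored as `W[s][x][y]` and chunks numbered from 1; the function name and data layout are hypothetical.

```python
def chunk_average_channel(W, s, m, c):
    """Average channel V_m(y|x) = (1/c) * sum of W(y|x, s_t) over the m-th chunk of s."""
    chunk = s[(m - 1) * c : m * c]
    num_inputs, num_outputs = len(W[0]), len(W[0][0])
    return [[sum(W[st][x][y] for st in chunk) / c for y in range(num_outputs)]
            for x in range(num_inputs)]

# Toy channel (an assumption): y = x XOR s, stored as W[s][x][y].
W_XOR = [[[1.0 if y == (x ^ s) else 0.0 for y in (0, 1)]
          for x in (0, 1)] for s in (0, 1)]
```

For this toy channel, a chunk in which the state equals 1 in one position out of four averages to a binary symmetric channel with crossover probability 1/4, which is the kind of channel a side-information set would need to contain.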

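We denote the set of possible values for $\mathcal{V}_m$ by $\mathbb{V}(c)$. This
is a collection of subsets of $\mathcal{W}_{\mathrm{std}}(A) \cap \mathcal{P}_c(\mathcal{Y}|\mathcal{X})$ for maximal
error and of $\mathcal{W}_{\mathrm{dep}}(A) \cap \mathcal{P}_c(\mathcal{Y}|\mathcal{X})$ for nosy noise. We will
assume a polynomial upper bound on the size of $\mathbb{V}(c)$:

$$ |\mathbb{V}(c)| \le c^{\nu} , \tag{24} $$

for some $\nu < \infty$.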

We consider two models for $\mathcal{V}_m$: in the first the decoder
gets an estimate of the empirical cost of the true state sequence,
and in the second the decoder gets an estimate of the mutual
information induced by the true channel. For rateless codes
under maximal error we will assume that the receiver gets an
estimate $\hat{A}_m$ such that the true cost

$$ A_m = \frac{1}{c} \sum_{t=(m-1)c+1}^{mc} l(s_t) \tag{25} $$

satisfies $A_m \le \hat{A}_m \le A_m + \epsilon$. The CSI set is then

$$ \mathcal{V}_m = \left\{ V(y|x) = \frac{1}{c} \sum_{i=1}^{c} W(y|x, s_i) : l(\mathbf{s}) \le l(s^{(mc)}) + c\epsilon \right\} . \tag{26} $$

We call such CSI $\epsilon$-cost-consistent.

For rateless codes under nosy maximal error, we will say a
CSI sequence is $\epsilon$-consistent for input $P$ if

$$ I(P, V_m) - \min_{V \in \mathcal{V}_m} I(P, V) \le \epsilon . \tag{27} $$

Our rateless codes for nosy maximal error will assume the
CSI sequence is $\epsilon$-consistent.

In our rateless code constructions we use a threshold rule
on the minimum mutual information of the channel consistent
with the side information $\mathcal{V}_1, \mathcal{V}_2, \ldots$. Once the receiver decides
to decode, it implements the decoding rule for the rateless
code. The decoder for the codes in Section V is a maximum
mutual information (MMI) decoder, and a natural question 
is whether the channel outputs can be used to decide the 
decoding time. One way to do this is for the decoder to restrict 
the side information set V m to those channels consistent with 
the output. 
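
A minimal sketch of a threshold rule of this kind follows. It assumes the decoder has already reduced each side-information set to a single number, the minimum mutual information (in bits per channel use) over the channels in that set. The function name, the `margin` parameter, and the choice to compare the running average against the empirical rate are illustrative assumptions and not the exact rule used in the constructions.

```python
import math

def decoding_time(min_mi_per_chunk, chunk_len, num_messages, margin=0.0):
    """Return the first M with log2(N)/(M*c) < (average of first M values) - margin, else None."""
    message_bits = math.log2(num_messages)
    running_sum = 0.0
    for M, min_mi in enumerate(min_mi_per_chunk, start=1):
        running_sum += min_mi
        empirical_rate = message_bits / (M * chunk_len)
        if empirical_rate < running_sum / M - margin:
            return M  # the single feedback bit would terminate transmission here
    return None  # the channel was too noisy to decode within the available chunks
```

For example, with 20 message bits, chunks of 10 channel uses, and per-chunk values 0.2, 0.6, 0.7, 0.9, the rule fires after the fourth chunk, when the empirical rate has dropped to 0.5 and the running average has risen to about 0.6.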

III. Main results and contributions 
A. Point-to-point AVCs 

Our first main result is Theorem 1, which is a characterization
of the capacity of the AVC with nosy noise. The proof is
given in Section IV-B.

Theorem 1: Let $\mathcal{W}$ be an AVC with state cost function $l(\cdot)$
and cost constraint $A$. Then $C_{\mathrm{dep}}(A)$ is the randomized coding
capacity of the AVC with nosy noise:

$$ \hat{C}_r(A) = C_{\mathrm{dep}}(A) . \tag{28} $$

Furthermore, for any $\epsilon > 0$, there exists an $n$ sufficiently
large such that the sequence of rate-key size pairs $(R, K(n))$
is achievable with nosy maximal error $\hat{\varepsilon}(n)$, where
$n^{2} \le K(n) \le \exp(n\epsilon)$ and

$$ R = C_{\mathrm{dep}}(A) - \epsilon \tag{29} $$

$$ \hat{\varepsilon}(n) \le \exp(-nE(\epsilon)) + \frac{12 n C_{\mathrm{dep}}(A) \log|\mathcal{Y}|}{\epsilon \sqrt{K(n)} \log K(n)} , \tag{30} $$

where $E(\alpha) > 0$ for $\alpha > 0$.

This theorem is proved by first constructing list-decodable
codes with constant list size for cost-constrained AVCs.
These list-decodable codes can be combined with a message-
authentication scheme due to Langberg [6] in Lemma 2, which
shows that a secret key can be used to disambiguate
the list. Because $\mathcal{W}_{\mathrm{std}}(A) \subseteq \mathcal{W}_{\mathrm{dep}}(A)$, in general we have
$C_{\mathrm{dep}}(A) \le C_{\mathrm{std}}(A)$. In some cases equality can hold, as in
the following example.

Example 1 (Bit-flipping (mod-two adder)): Consider
an AVC with input alphabet $\mathcal{X} = \{0, 1\}$, state alphabet
$\mathcal{S} = \{0, 1\}$ and output alphabet $\mathcal{Y} = \{0, 1\}$, with

$$ y = x \oplus s , $$

where $\oplus$ denotes addition modulo two. This is a "bit-flipping
AVC" in which the jammer can flip the input ($s = 1$). We
choose $l(s) = s$ so that the state constraint $A < 1/2$ bounds
the fraction of bits which can be flipped by the jammer. It has
been shown [6], [21] that

$$ C_{\mathrm{std}}(A) = 1 - h_b(A) $$
$$ C_{\mathrm{dep}}(A) = 1 - h_b(A) , $$

where $h_b(t) = -t \log t - (1 - t) \log(1 - t)$ is the binary entropy
function. In this case, we have $C_{\mathrm{std}}(A) = C_{\mathrm{dep}}(A)$. Furthermore,
the capacity under randomized coding and maximal
error is $C_r(A) = C_{\mathrm{std}}(A)$ and the capacity under randomized
coding and nosy noise is $\hat{C}_r(A) = C_{\mathrm{dep}}(A)$.

Although for this bit-flipping example the two max-min 
expressions have the same value, this is not the case for general 
AVCs. In the previous example the addition was taken over 
the finite field F2. If we instead take the addition over the 
integers the two quantities are different. 

Example 2 (Real adder ): Consider an AVC with input al- 
phabet X = {0,1}, state alphabet S = {0, 1} and output 
alphabet y = {0,1,2}, with 

y = x + s . 

We choose l(s) = s so that the constraint A on the jammer 
bounds the weight of its input. For this channel, if $A > 1/2$
Csiszár and Narayan [21] showed that $C_r(A) = 1/2$ and is
achieved with $P = (1/2, 1/2)$. However, in the case of nosy
noise the capacity is lower when $A > 1/2$: because the jammer
can see the codeword, it can selectively set the output to be 1
if $P = (1/2, 1/2)$. We have [5]:



Cdep(A) = h b 



Thus we can see that $\hat{C}_r(A) = C_{\mathrm{dep}}(A) < 1/2$.

Fig. 4. Decoding rate versus time in a rateless code. The empirical mutual
information corresponding to the AVC with the true cost (solid line) varies,
and the $\epsilon$-consistent channel estimates (dashed line) can track it. Once the
channel estimates cross the decoding threshold (dotted line), the receiver
terminates transmission and tries to decode.



B. Rateless coding 

Theorems 2 and 3 provide achievable strategies for rateless
coding over channels with input-independent and input-dependent
state, respectively. The proofs of these theorems are
given in Section V-C and Section VI-C. To state our results in
a way that makes the tradeoff between error probability and
blocklength clearer, we will assume

$$ c(n) = n^{1/4} \tag{31} $$
$$ M^{*}(n) = n / c(n) = n^{3/4} . \tag{32} $$



For maximum and minimum rates $R_{\max}$ and $R_{\min}$ the number
of messages is $N(n) = \exp(nR_{\min})$ and $M_* = \frac{R_{\min}}{R_{\max}} M^*$.

Theorem 2: Let $\mathcal{W}$ be an AVC with state cost function
$l(\cdot)$. Fix $\epsilon > 0$, $R_{\min} > 0$, and input type $P \in \mathcal{P}(\mathcal{X})$ with
$\min_{x} P(x) > 0$. Then there is an $n_0$ sufficiently large such
that for all $n > n_0$ there exists a $(c(n), \exp(nR_{\min}), K(n))$
randomized rateless code with $K(n)/n \to \infty$ whose decoding
time satisfies



$$ \mathbf{M} = \min_{M_* \le M \le M^*} \left\{ M : \frac{nR_{\min}}{Mc} < I - g(\epsilon) \right\} , \tag{33} $$

where $g(\epsilon) \to 0$ as $\epsilon \to 0$. The maximal error of the code at
this decoding time satisfies



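$$ \varepsilon(\mathbf{M}, \mathbf{s}, \mathcal{V}_1^{M}) = O\left( \frac{n}{K(n)} \right) $$

for state sequences $\mathbf{s}$ and $\epsilon$-cost-consistent CSI $\mathcal{V}_1^{M}$.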

This theorem says that if the CSI estimates the state cost
in each chunk to within $\epsilon$, then the decoder will terminate
transmission as soon as the mutual information of the channel
exceeds the empirical rate $nR_{\min}/(Mc)$. This is illustrated in Figure 4.
The solid line represents the mutual information of the
AVC corresponding to the true state cost $l(s_1^{Mc})$, which is
the worst case over all state sequences whose cost is less
than or equal to the true cost. The dashed line represents the
mutual information corresponding to the estimated cost. The
dotted line is the empirical rate, so once the estimate crosses
the threshold the decoder will decode. Furthermore, the
error decays as $O(n/K(n))$. The codebook is constructed by
taking a fully randomized constant composition code that is
good for an AVC, manipulating it into a rateless code with
the desired properties, and reducing the common randomness
using Lemma 1.

Theorem 3: Let $\mathcal{W}$ be an AVC with state cost function
$l(\cdot)$. Fix $R_{\min} > 0$, $\epsilon > 0$, and input type $P \in \mathcal{P}(\mathcal{X})$ with
$\min_{x} P(x) > 0$. Then there is an $n_0$ sufficiently large such
that for all $n > n_0$, there exists a $(c(n), \exp(nR_{\min}), K(n))$
rateless code whose decoding time satisfies



$$ \mathbf{M} = \min_{M_* \le M \le M^*} \left\{ M : \frac{nR_{\min}}{Mc} < \frac{1}{M} \sum_{m=1}^{M} I(P, V_m) - 2\epsilon \right\} , $$

where $V_m$ is the average channel in (23). The nosy maximal
error at this decoding time satisfies



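$$ \varepsilon(\mathbf{M}, J, \mathcal{V}_1^{M}) \le O\left( \frac{1}{\epsilon \sqrt{K} \log K} \right) $$

for $K(n) = O(\exp(c))$, state sequences $\mathbf{s}$ and $\epsilon$-consistent
side information given by (27).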

The theorem says that there exists a rateless code which
can be decoded as soon as the empirical mutual information
$c \sum_{m=1}^{M} I(P, V_m)$ is enough to sustain the $nR_{\min}$ bits for the
message, assuming the side information is $\epsilon$-consistent. This
threshold is sufficient to guarantee decoding error probability
that decays like $1/(\sqrt{K} \log K)$ for an AVC with nosy noise.

In this code, the decoder decodes each chunk of $c$ channel uses
into a list of possible messages. As more chunks are received,
the list size shrinks and the decoding time $\mathbf{M}$ is chosen to
guarantee that the list size is bounded by a constant. Lemma 2
shows that this code can be used as part of a randomized code
in which the secret key disambiguates the list at the decoder.

Example 3 (Bit-flipping (mod-two adder)): Consider the
mod-two additive AVC described in Example 1,
where the partial side information $\mathcal{V}_m$ is an estimate $\hat{A}_m$
of the empirical Hamming weight of the state sequence
$s^{(mc)}$. The receiver tracks the empirical weight of the state
sequence to compute an estimate $\hat{A}_m$ of the crossover
probability. Theorems 2 and 3 both give rateless codes
that can decode as soon as the estimated empirical mutual
information $Mc(1 - h_b(\hat{A}_m))$ exceeds the size of the
message ($\log N$ bits). As $R_{\min}$ can be as small as we like,
these codes can work for empirical state sequences with
Hamming weight arbitrarily close to 1/2. The realized rate
is within $\epsilon$ of $1 - h_b(\hat{A}_m)$, but the two codes differ greatly
in the dependence of the error probability on the amount of
common randomness. When the bit-flips cannot depend on
the transmitted codeword, the error decays with $K^{-1}$, and
when they can it decays with $(\sqrt{K} \log K)^{-1}$.
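
The decoding rule in this example can be simulated for a given bit-flip pattern. The sketch below reads the threshold as a per-chunk sum of $c(1 - h_b(\hat{A}_m))$, which is one interpretation and an assumption; the function names, the additive slack `eps` on the weight estimate, and the cap of the estimate at 1/2 are likewise hypothetical choices made for illustration.

```python
import math

def binary_entropy(t):
    """Binary entropy in bits."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def bitflip_decoding_time(s, chunk_len, message_bits, eps=0.0):
    """First chunk index M at which the accumulated estimate exceeds message_bits, else None."""
    accumulated = 0.0
    for M in range(1, len(s) // chunk_len + 1):
        chunk = s[(M - 1) * chunk_len : M * chunk_len]
        # Estimated crossover probability: empirical weight plus slack, capped at 1/2 (assumption).
        weight_estimate = min(sum(chunk) / chunk_len + eps, 0.5)
        accumulated += chunk_len * (1.0 - binary_entropy(weight_estimate))
        if accumulated > message_bits:
            return M
    return None
```

With chunks of 8 channel uses and a 12-bit message, a state sequence that is clean in the first chunk, flips one bit in the second, and is clean in the third accumulates roughly 8, 11.65, and 19.65 bits, so the sketch decodes after the third chunk. A sequence that flips half of the bits in every chunk never crosses the threshold.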

a) Remarks on the example: For the bit-flipping example, 
the rates guaranteed by both theorems are close to the capacity 
of the AVC with the corresponding cost constraint. However, 
in general this may not be the case. Both coding schemes 
use a fixed input type $P$, which is a common feature of
rateless coding strategies [9], [12], [35] but may result in some
loss in rate [37] with respect to an input distribution chosen
with knowledge of the empirical state distribution. It may be
possible to adapt the channel input distribution, perhaps using
ideas from universal prediction [38], but we leave that for future
work.

This scheme can also be used with more general settings for 
the parameters of the scheme, such as the chunk size. Finally, 
we can also consider the case where the side information is 
merely consistent. In this setting it is hard to quantify how 
close the rate at which we decode will be to the true channel, 
since there are no guarantees on the tightness of the channel 
estimates. 

IV. Two partial derandomization techniques

In this paper we are interested in the tradeoffs between error 
probability and the amount of common randomness available 
to the encoder and decoder. In this section we will show how to
exploit existing techniques to partially derandomize the rateless
code constructions in Theorems 2 and 3.

The "elimination technique" is due to Ahlswede [13] and
uses a key size of $O(n)$ bits to achieve exponential decay in
the probability of error [24]. The amount of shared common
randomness is on the same order as the data to be transmitted,
reminiscent of Shannon's "one-time pad" [39] for cryptography.
Lemma 1 applies this technique to the randomized codes
of Hughes and Thomas [25], [26] and quantifies an achievable
tradeoff between randomization and error decay. This may be
useful in engineering applications in which sharing $O(n)$ bits
of key to send $O(n)$ bits of data is unreasonable. We will use
the result to bound the common randomness needed for the
rateless codes considered in Section V.

A second derandomization procedure was suggested by 
Langberg [6] for what he called an "adversarial channel" (in 
the terminology of this paper, a binary bit-flipping AVC with 
nosy noise). The construction starts with a list-decodable code 
and creates large overlapping subsets of codewords for each 
key. These sub-codebooks should be large so that the number 
of messages is close to the rate of the list-decodable code, and 
the overlap should be large so that the jammer does not learn 
the key from seeing the codeword. The encoder chooses the 
codeword corresponding to message m in the sub-codebook 
given by key k. The decoder first uses the list-decoder to find 
a list of L candidate codewords. By exploiting a combinatorial 
construction due to Erdős, Frankl, and Füredi [40], the sub-
codebook structure can be chosen so that with high probability,
only one of the codewords in the list at the decoder is in the 
sub-codebook corresponding to k. 



A. Derandomization for AVCs with maximal error: "elimination"

Lemma 1 (Elimination technique [13]): Let $J$ be a positive
integer and let $\mathbf{C}$ be an $(n, N, J)$ randomized code with
$N = \exp(nR)$ whose expected maximal error satisfies

$$\max_{\mathbf{s} \in \mathcal{S}^n(\Lambda)} \max_{i} \mathbb{E}_{\mathbf{C}}[\varepsilon(i, \mathbf{s})] \le \delta(n) ,$$

for an AVC $\mathcal{W}$ with cost function $l(\cdot)$ and cost constraint $\Lambda$.
Then for all $\mu$ satisfying

$$\mu \log \delta(n)^{-1} - h_b(\mu) \log 2 > \frac{n}{K} \left( R \log 2 + \log |\mathcal{S}| \right) ,$$

where $h_b(\mu)$ is the binary entropy function, except with probability
exponentially small in $n$, the $(n, N, K)$ randomized code
uniformly distributed on $K$ iid copies from $\mathbf{C}$ will have
maximal probability of error less than $\mu$.

The proof follows directly from the arguments in [13] and is
omitted. In particular, if there is a sequence of randomized
codes whose errors decay exponentially:

$$\delta(n) \le \exp(-\alpha n) ,$$

then a little algebra shows that we can choose the key size
$K(n)$ and the error $\mu$ to satisfy

$$\mu = \zeta \frac{n}{K(n)} ,$$

for some $\zeta > 0$. In particular, the code of Hughes and Thomas
[25], [26] has exponentially decaying error probability, so
Lemma 1 shows that the randomized coding capacity $C_r(\Lambda)$
is achievable with common randomness $K(n)$ polynomial in
$n$, which corresponds to $O(\log n)$ bits.
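As a rough numerical illustration of this tradeoff, the following Python sketch finds the smallest key size that meets the condition of Lemma 1. It assumes the condition reads $K(\mu \log \delta(n)^{-1} - h_b(\mu) \log 2) > n(R \log 2 + \log|\mathcal{S}|)$ with natural logarithms and $R$ in bits; the function names and the example numbers are hypothetical and are not taken from [13].

```python
import math


def binary_entropy_bits(mu: float) -> float:
    """Binary entropy h_b(mu) in bits, with h_b(0) = h_b(1) = 0."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -mu * math.log2(mu) - (1.0 - mu) * math.log2(1.0 - mu)


def per_key_exponent(mu: float, log_inv_delta: float) -> float:
    """mu * ln(1/delta) - h_b(mu) * ln 2 (assumed left side of Lemma 1, per key)."""
    return mu * log_inv_delta - binary_entropy_bits(mu) * math.log(2.0)


def condition_holds(n, K, mu, log_inv_delta, rate_bits, num_states) -> bool:
    """Check K * per_key_exponent > n * (R ln 2 + ln |S|)."""
    rhs = n * (rate_bits * math.log(2.0) + math.log(num_states))
    return K * per_key_exponent(mu, log_inv_delta) > rhs


def smallest_key_size(n, mu, log_inv_delta, rate_bits, num_states):
    """Smallest integer K meeting the condition, or None if no K can."""
    gain = per_key_exponent(mu, log_inv_delta)
    if gain <= 0.0:
        return None
    rhs = n * (rate_bits * math.log(2.0) + math.log(num_states))
    return math.floor(rhs / gain) + 1


# Hypothetical example: delta(n) = exp(-0.1 n), n = 1000, target error mu = 0.01.
n = 1000
K = smallest_key_size(n, mu=0.01, log_inv_delta=0.1 * n, rate_bits=0.5, num_states=2)
key_bits = math.ceil(math.log2(K))
```

In this example the number of keys is on the order of $n$, so the key itself takes only about $\log_2 K$ bits, far fewer than the $O(n)$ bits of the fully randomized code.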

B. Derandomization for AVCs with nosy noise: message authentication

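In [5] it is shown that for any $\varepsilon > 0$ and $P \in \mathcal{P}(\mathcal{X})$ with
$\max_x P(x) > 0$, for $n$ sufficiently large there exists a
list-decodable code with codewords of type $P$, rate

$$R = \min_{V \in \mathcal{W}_{dep}(P, \Lambda)} I(P, V) - \varepsilon , \qquad (34)$$

list size

$$L \le \left\lfloor \frac{6 \log |\mathcal{Y}|}{\varepsilon} \right\rfloor + 1 , \qquad (35)$$

and error

$$\varepsilon_L \le \exp(-n E(\varepsilon)) ,$$

where $E(\varepsilon) > 0$. The argument is based on those of
Ahlswede [3], [4].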

For AVCs with nosy noise, the state can depend on the trans- 
mitted codeword. By combining these list-decodable codes 
with a message authentication scheme used by Langberg [6], 
we can construct randomized codes for this channel with 
limited common randomness. The relationship between the 
key size, list size, and error is given by the following Lemma. 

Lemma 2 (Message Authentication [6]): Let $\mathcal{W}$ be an AVC
and suppose we are given an $(n, N, L)$ deterministic
list-decodable code with probability of error $\varepsilon$. For key size
$K(n)$, where $K(n)$ is a power of a prime, there exists an
$(n, N/\sqrt{K(n)}, K(n))$ randomized code with nosy maximal
error $\varepsilon(\mathbf{s})$ such that

$$\max_{\mathbf{s}} \varepsilon(\mathbf{s}) \le \varepsilon + \frac{2 L \log N}{\sqrt{K(n)} \log K(n)} . \qquad (36)$$
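To see how the key size trades off against error and rate in Lemma 2, here is a small Python sketch that evaluates the bound $\varepsilon + 2L\log N / (\sqrt{K}\log K)$ and the rate $\frac{1}{n}\log(N/\sqrt{K})$ of the resulting randomized code. It assumes natural logarithms throughout and takes $\log N$ as an input to avoid overflow; the function names and example values are hypothetical.

```python
import math


def authentication_error_bound(list_error: float, L: int, log_N: float, K: int) -> float:
    """Evaluate list_error + 2 L log N / (sqrt(K) log K), the assumed form of (36)."""
    return list_error + 2.0 * L * log_N / (math.sqrt(K) * math.log(K))


def authenticated_rate(n: int, log_N: float, K: int) -> float:
    """Rate (in nats per channel use) of a code with N / sqrt(K) messages."""
    return (log_N - 0.5 * math.log(K)) / n


# Hypothetical example: n = 1000, N = exp(500), list size 10, K = 2**40 keys.
n, log_N, L, K = 1000, 500.0, 10, 2 ** 40
penalty = authentication_error_bound(0.0, L, log_N, K)
rate = authenticated_rate(n, log_N, K)
```

With these numbers the additive error penalty is below $10^{-3}$ while the rate drops from 0.5 to roughly 0.486 nats per channel use, which illustrates why a key size subexponential in $n$ costs a vanishing fraction of the rate.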
By choosing the appropriate input distribution we can obtain
our first new result: a formula for the randomized coding
capacity for the AVC under nosy noise.


Proof of Theorem 1: To show the converse,
note that the jammer can choose a memoryless strategy
$U(s|x) \in \mathcal{U}(P, \Lambda)$. Choosing the worst $U$ yields a discrete
memoryless channel whose capacity is $C_{dep}(\Lambda)$, and therefore
the randomized coding capacity for this channel is given by
$C_{dep}(\Lambda)$.

To show that rates below $C_{dep}(\Lambda)$ are achievable, we first
fix $K(n)$ and let $P$ be the input distribution maximizing
$C_{dep}(\Lambda)$. We can use the previous lemma with our result
on list-decodable codes to achieve the desired tradeoff. Using
(35), for any $\varepsilon_1(n) > 0$ we can choose an $(n, N(n), L)$
list-decodable codebook with codewords of type $P$ such that

$$L = \left\lfloor \frac{6 \log |\mathcal{Y}|}{\varepsilon_1(n)} \right\rfloor + 1 ,$$

$$N(n) = L \exp\left( n \left( C_{dep}(\Lambda) - \varepsilon_1(n) \right) \right) ,$$

and error

$$\varepsilon_L \le \exp(-n E(\varepsilon_1(n))) .$$

We can use Lemma 2 to construct an
$(n, N(n)/\sqrt{K(n)}, K(n))$ randomized code with error
probability

$$\varepsilon \le \exp(-n E(\varepsilon_1(n))) + \frac{2 L \log N(n)}{\sqrt{K(n)} \log K(n)}
\le \exp(-n E(\varepsilon_1(n))) + \frac{12 n C_{dep}(\Lambda) \log |\mathcal{Y}|}{\varepsilon_1(n) \sqrt{K(n)} \log K(n)} .$$

The rate of this randomized code is

$$R = \frac{1}{n} \log \frac{N(n)}{\sqrt{K(n)}}
= C_{dep}(\Lambda) - \varepsilon_1(n) - \frac{1}{n} \log \frac{\sqrt{K(n)}}{L} .$$

For any $\varepsilon > 0$ and $K(n) < \exp(n\varepsilon)$ we can choose $\varepsilon_1(n)$
small enough so that $R = C_{dep}(\Lambda) - \varepsilon$. ■



C. An open question: converses for common randomness 

Common randomness is an important resource for coding 
strategies for the AVC. The two strategies mentioned in this 
section show that it is sufficient to have common randomness
of O(log n) bits to achieve the randomized coding capacity.
It is not clear that randomness is necessary to achieve rates 
as high as the randomized coding capacity. Because the 
deterministic coding capacity question is notoriously difficult, 
it would be of interest to prove lower bounds on the common 
randomness needed to achieve the randomized coding capacity. 

V. Rateless coding with cost information under maximal error

We now prove Theorem 2 on rateless coding for AVCs under
maximal error. We develop a coding strategy that chooses a
decoding time based on information about the cost of the
actual state sequence $\mathbf{s}$. We assume the state sequence $\mathbf{s}$ is fixed
and estimates $\hat{l}(\mathbf{s}_1^{Mc})$ are revealed to the decoder after $Mc$
channel uses. The decoder picks the decoding time $M$ such
that the empirical rate is close to the mutual information of
an AVC with cost constraint $\hat{l}(\mathbf{s}_1^{Mc})$. We use the construction
of Hughes and Thomas [25] as a basis for constructing a
randomized rateless code using a maximum mutual information
(MMI) decoder with unbounded key size, and use Lemma 1
to partially derandomize the construction.

In this section we will assume the CSI takes the form of 
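and that it is $\varepsilon$-cost-consistent. Define

$$\Lambda_M = \frac{1}{M} \sum_{m=1}^{M} \lambda_m \qquad (37)$$

$$\hat{\Lambda}_M = \frac{1}{M} \sum_{m=1}^{M} \hat{\lambda}_m \qquad (38)$$

to be the true and estimated cost for the state sequence $\mathbf{s}_1^{Mc}$.
The number of possible values for $\lambda_m$ is at most $(c + 1)^{|\mathcal{S}|}$,
which is an upper bound on the number of types on $\mathcal{S}$ with
denominator $c$. Without loss of generality we can assume $\hat{\lambda}_m$
takes values in the same set as $\lambda_m$.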



A. The coding strategy 

Our scheme uses a fixed maximum blocklength $n$ and we
will express other parameters as functions of $n$. For a fixed
minimum rate $R_{\min}$, input distribution $P$, and key size $K(n)$
we will construct a randomized rateless code with chunk size
$c(n) = n^{1/4}$ and decoding time $M^*(n) = n^{3/4}$ (see (31)
and (32)). We will also use a parameter $R_{\min}$ which is the
minimum rate of the code, so $N(n) = \exp(n R_{\min})$.

Algorithm I: Rateless coding for standard AVCs

1) The encoder and decoder choose a key $k \in [K(n)]$ using
common randomness. The encoder chooses a message
$i \in [N(n)]$ to transmit and maps it into a codeword
$\mathbf{x}(i, k)$.

2) For $m = 1, 2, \ldots, M_* - 1$ the encoder transmits
$\mathbf{x}^{(mc)}(i, k)$ in the $m$-th chunk and the decoder sets the
feedback bit $\tau_{m-1}(\mathbf{y}_1^{(m-1)c}, \hat{\lambda}_1^{m-1}, k) = 0$.

3) For $m = M_*, \ldots, M^* = n/c$, if
$\tau_{m-1}(\mathbf{y}_1^{(m-1)c}, \hat{\lambda}_1^{m-1}, k) = 0$, the encoder transmits
$\mathbf{x}^{(mc)}(i, k)$ in channel uses $(m-1)c + 1, (m-1)c + 2, \ldots, mc$.

4) The decoder receives channel outputs $\mathbf{y}^{(mc)}$ and an
estimate $\hat{\lambda}_m$ of the state cost in the $m$-th chunk. Define
the decision function $\tau_m(\mathbf{y}_1^{mc}, \hat{\lambda}_1^{m}, k)$ by

$$\tau_m(\mathbf{y}_1^{mc}, \hat{\lambda}_1^{m}, k) = \mathbf{1}\left( \frac{\log N}{mc} < I\left(P, \mathcal{W}_{std}(\hat{\Lambda}_m)\right) \right) , \qquad (39)$$

where $\hat{\Lambda}_m$ is given by (38). If $\tau_m(\cdot) = 1$ then the
decoder attempts to decode the received sequence, sets
$\hat{i} = \psi_m(\mathbf{y}_1^{mc}, k)$, and feeds back a 1 to terminate
transmission. Otherwise, the decoder feeds back a 0 and
we return to step 3) to send chunk $m + 1$.
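The stopping rule in Algorithm I can be sketched in Python as follows. This is only an illustration under stated assumptions: the decision rule is taken to be $\frac{\log N}{mc} < I(P, \mathcal{W}_{std}(\hat{\Lambda}_m))$, rates are in bits, and the mutual information is instantiated for a hypothetical binary additive AVC with Hamming cost and uniform input, for which $I(P, \mathcal{W}_{std}(\Lambda)) = 1 - h_b(\Lambda)$ when $\Lambda \le 1/2$. The names `decoding_time` and `mi_binary_additive` are illustrative.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def mi_binary_additive(cost: float) -> float:
    """Assumed example: 1 - h_b(cost) for a binary additive AVC, uniform input."""
    cost = min(max(cost, 0.0), 0.5)
    return 1.0 - h2(cost)


def decoding_time(cost_estimates, log2_N, c, m_min, mutual_info=mi_binary_additive):
    """Return the first chunk index m >= m_min at which the decoder stops.

    cost_estimates[m-1] is the estimated per-symbol state cost in chunk m.
    Returns None if the rule never fires within the available chunks.
    """
    total = 0.0
    for m, lam_hat in enumerate(cost_estimates, start=1):
        total += lam_hat
        if m < m_min:
            continue  # the decoder always feeds back 0 before chunk m_min
        avg_cost = total / m  # running average of the chunk estimates
        if log2_N / (m * c) < mutual_info(avg_cost):
            return m
    return None
```

For instance, with 200 message bits, chunks of 100 channel uses, and a constant estimated cost of 0.1, the empirical rate $2/m$ first drops below $1 - h_b(0.1) \approx 0.53$ at $m = 4$. A quieter state sequence lets the decoder stop earlier, which is the point of the rateless construction.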

Our code relies on the existence of a set of codewords
$\{\mathbf{x}(i, k)\}$ which, when truncated to blocklength $mc$, form a
good randomized code for an AVC satisfying a given cost
constraint. The key to our construction is that the condition
checked by the decision function (39) is sufficient to guarantee
that the decoding error will be small. In order to facilitate the
analysis of the coding strategy, define the rate $R_M$ at time $M$:

$$R_M = \frac{\log N}{Mc} .$$

B. Randomized codebook construction 

Our codebook will consist of codewords drawn uniformly
from the set

$$(T_c(P))^{n/c} = \underbrace{T_c(P) \times T_c(P) \times \cdots \times T_c(P)}_{n/c \text{ times}} . \qquad (40)$$

That is, the codewords are formed by concatenating
constant-composition chunks of length $c$.
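A minimal Python sketch of drawing one codeword uniformly from such a product of type classes is given below. It relies on the fact that shuffling a fixed sequence of the right composition yields a uniform draw from the type class; the composition is passed as symbol counts per chunk, and the function names are illustrative.

```python
import random
from collections import Counter


def composition_chunk(counts, rng):
    """Draw one chunk uniformly from the type class with the given symbol counts."""
    chunk = [symbol for symbol, k in counts.items() for _ in range(k)]
    rng.shuffle(chunk)  # a uniform random permutation gives a uniform type-class member
    return chunk


def sample_codeword(counts, num_chunks, rng):
    """Concatenate independent constant-composition chunks."""
    word = []
    for _ in range(num_chunks):
        word.extend(composition_chunk(counts, rng))
    return word


# Example: binary alphabet, chunk length c = 4 with type P = (3/4, 1/4), n/c = 5 chunks.
rng = random.Random(0)
counts = {0: 3, 1: 1}
codeword = sample_codeword(counts, 5, rng)
```

Because every chunk has the same composition, any truncation of the codeword to a whole number of chunks also has type $P$, which is what makes the truncated codebooks usable at every candidate decoding time.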

Lemma 3 (Fully randomized rateless codebook): Let $\mathcal{W}$
be an AVC with cost function $l(\cdot)$. For any $\delta > 0$, $R_{\min} > 0$
and input distribution $P \in \mathcal{P}(\mathcal{X})$ with $\min_x P(x) > 0$, for
sufficiently large blocklength $n$ there exists a randomized
rateless code with $N(n) = \exp(n R_{\min})$ messages whose
decoding time $M$ satisfies (39) and whose rate at the decoding
time $M$ satisfies

$$\frac{\log N}{Mc} \ge I\left(P, \mathcal{W}_{std}\left( \frac{1}{Mc} \sum_{i=1}^{Mc} l(s_i) \right)\right) - f(\delta) \qquad (41)$$

for all $\mathbf{s}$ and $\delta$-cost-consistent partial state information
sequence $\mathcal{V}_1^M$, where $f(\delta) \to 0$ as $\delta \to 0$. The error at decoding
time satisfies

$$\varepsilon(M, \mathbf{s}, \mathcal{V}_1^M) = O\left( \exp(-E_3(\delta) Mc) \right) , \qquad (42)$$

where $E_3(\delta) > 0$.



C. Proof of Theorem 2

We are now ready to prove Theorem 2.

Proof: Fix $\varepsilon > 0$, $R_{\min} > 0$ and $P \in \mathcal{P}(\mathcal{X})$. Choose $n$
sufficiently large so that the codebook-valued random variable
$\mathbf{C}_{M^*}$ that is the randomized code from Lemma 3 satisfies
(42) with $\varepsilon = \delta$ under the conditions on the state and side
information in (26) and (33). For each $M$, let $\mathbf{C}_M$ be the
codebook truncated to blocklength $Mc$.

We can now draw $K(n)$ codebooks sampled uniformly from
$\mathbf{C}_{M^*}$. Since $\mathbf{C}_{M^*}$ truncated to blocklength $Mc$ is $\mathbf{C}_M$, this
sampling induces a sampling on $\mathbf{C}_M$ for each $M$. Each of
these truncated codebooks has error probability exponentially
small in $Mc$, so by Lemma 1 we can choose $n$ sufficiently
large and chunk size $c(n)$ so that with probability going to 1,
the error probability is at most $O(n/K(n))$ for each of the
truncated codes. Therefore a code satisfying the conditions of
the Theorem exists. ■

D. An application to individual sequence channels 

One case in which we can obtain $\delta$-cost-consistent state
information is in the scheme proposed by Eswaran et al. [12]
for coding over a channel with individual state sequence. The
codes from this section can be used as a component in that
coding scheme, which is an iterated rateless coding strategy
using zero-rate feedback and unlimited common randomness.
In each iteration, the encoder and decoder use common
randomness to select a rateless code and use randomized
training positions to estimate the channel quality. The rateless
code uses the channel estimates to pick a decoding time.
One drawback of the scheme in [12] is that the amount of
common randomness needed to choose the rateless code is
very large. By using the rateless code constructed in Theorem
2, the amount of common randomness can be reduced and can
be accommodated in the zero-rate feedback link.

VI. Rateless coding for channels with input-dependent state

We now prove Theorem 3 on rateless coding for AVCs under
nosy maximal error. The idea is to build rateless codes which
are list-decodable with constant list size at the decoding time
M. Lemma 2 can be used with these list-decodable codes
to construct a randomized code with small key size.

A. The coding strategy 

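We explicitly use information about the output sequence
$\mathbf{y}$ at the decoder together with the side information $\mathcal{V}_m$. For
$\delta > 0$ and distribution $P \in \mathcal{P}(\mathcal{X})$, given the $m$-th chunk of
channel outputs $\mathbf{y}^{(mc)}$ and the side information set $\mathcal{V}_m$, define

$$\mathcal{V}_m(\mathbf{y}^{(mc)}, \delta) = \left\{ V \in \mathcal{V}_m : d_{\max}\left( T_{\mathbf{y}^{(mc)}}, \sum_{x} P(x) V(y|x) \right) < \delta \right\} ,$$

where $d_{\max}(\cdot, \cdot)$ is the total variational distance. Although
$\mathcal{V}_m(\mathbf{y}^{(mc)}, \delta)$ depends on $P$, in our construction $P$ is fixed
so we do not make this dependence explicit.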
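The filtering step in this definition can be illustrated with a short Python sketch. It keeps the candidate channels whose induced output distribution $\sum_x P(x)V(y|x)$ is within $\delta$ of the empirical type of the received chunk. The distance here is the standard total variation $\frac{1}{2}\sum_y |p(y) - q(y)|$, which is an assumption about the normalization of $d_{\max}$; channels are row-stochastic lists `V[x][y]` and the function names are illustrative.

```python
from collections import Counter


def empirical_type(chunk, num_outputs):
    """Empirical distribution of the output symbols 0..num_outputs-1 in a chunk."""
    counts = Counter(chunk)
    return [counts.get(y, 0) / len(chunk) for y in range(num_outputs)]


def output_distribution(P, V):
    """Output marginal sum_x P(x) V(y|x) for a channel given as rows V[x]."""
    return [sum(P[x] * V[x][y] for x in range(len(P))) for y in range(len(V[0]))]


def total_variation(p, q):
    """Standard total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def consistent_channels(chunk, P, candidates, delta):
    """Indices of candidate channels consistent with the observed output type."""
    observed = empirical_type(chunk, len(candidates[0][0]))
    return [idx for idx, V in enumerate(candidates)
            if total_variation(observed, output_distribution(P, V)) < delta]


# Example: uniform binary input, a BSC(0.1) and an asymmetric channel as candidates.
P = [0.5, 0.5]
bsc = [[0.9, 0.1], [0.1, 0.9]]
asym = [[1.0, 0.0], [0.5, 0.5]]
chunk = [0, 1] * 5
kept = consistent_channels(chunk, P, [bsc, asym], delta=0.1)
```

Note that the output type alone cannot separate channels with the same output marginal (all binary symmetric channels look alike under a uniform input), so the set that survives is in general larger than the true channel.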
Algorithm II: Rateless coding for "nosy noise"

1) The encoder and decoder choose a key $k \in [K]$ using
common randomness. The encoder chooses a message
$i \in [N]$ to transmit and maps it into a codeword
$\mathbf{x}(i, k)$.

2) If $\tau_{m-1}(\cdot) = 0$, the encoder transmits $\mathbf{x}^{(mc)}(i, k)$ in channel
uses $(m-1)c + 1, (m-1)c + 2, \ldots, mc$.

3) The decoder receives channel outputs $\mathbf{y}^{(mc)}$ and the
channel state information set $\mathcal{V}_m$ and calculates the set
of possible channels $\mathcal{V}_m(\mathbf{y}^{(mc)}, \delta)$. Define the decision
function $\tau_m(\mathbf{y}_1^{mc}, \mathcal{V}_1^{m}, k)$ as

$$\tau_m(\mathbf{y}_1^{mc}, \mathcal{V}_1^{m}, k) = \mathbf{1}\left( \frac{\log N}{mc} < \frac{1}{m} \sum_{i=1}^{m} I\left(P, \mathcal{V}_i(\mathbf{y}^{(ic)}, \delta)\right) - \varepsilon \right) . \qquad (43)$$

If $\tau_m(\cdot) = 1$ then the decoder attempts to decode the
received sequence, sets $\hat{i} = \psi_m(\mathbf{y}_1^{mc}, k)$, and feeds back
a 1 to terminate transmission. Otherwise, it feeds back
a 0 and we return to step 2) for chunk $m + 1$.
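The decision rule of Algorithm II can be sketched in Python as below. Two assumptions are made loudly: the rule is taken to be $\frac{\log N}{mc} < \frac{1}{m}\sum_{i \le m} I(P, \mathcal{V}_i(\mathbf{y}^{(ic)}, \delta)) - \varepsilon$ (the form implied by (51)), and the mutual information of a set of channels is taken to be the minimum of $I(P, V)$ over the set. Rates are in bits and the function names are illustrative.

```python
import math


def mutual_information_bits(P, V):
    """I(P, V) in bits for input distribution P and channel rows V[x][y]."""
    q = [sum(P[x] * V[x][y] for x in range(len(P))) for y in range(len(V[0]))]
    total = 0.0
    for x, px in enumerate(P):
        for y, v in enumerate(V[x]):
            if px > 0.0 and v > 0.0:
                total += px * v * math.log2(v / q[y])
    return total


def nosy_decoding_time(channel_sets, P, log2_N, c, eps):
    """First chunk m at which the empirical rate falls below the averaged
    worst-case mutual information minus eps; None if that never happens.

    channel_sets[m-1] is the (non-empty) list of channels still consistent
    with the m-th chunk of outputs and side information.
    """
    running = 0.0
    for m, candidates in enumerate(channel_sets, start=1):
        running += min(mutual_information_bits(P, V) for V in candidates)
        if log2_N / (m * c) < running / m - eps:
            return m
    return None
```

As a usage example, if every chunk is consistent with both a BSC(0.1) and a BSC(0.2) under a uniform input, the worst case is $1 - h_b(0.2) \approx 0.278$ bits, so a code with 50 message bits and chunks of 100 uses stops at the third chunk when $\varepsilon = 0.05$.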

The rateless code developed in this section has codewords
in $(T_c(P))^M$, i.e. they have type $P$ in each chunk. Once the
decision threshold $M$ is reached, the decoder list decodes the
received codeword and produces a list of candidate
message-key pairs. From Lemma 2, with high probability there will be
only one message-key pair in the list consistent with the key
used to encode the message.

B. List-decodable codes

The codebook we use is again sampled from $(T_c(P))^{n/c}$
given in (40). In Lemma 4 we show that a codebook consisting
of all sequences in $T_c(P)$ can be used as a list-decodable
code with a list size that depends on a channel estimate at the
decoder. This list size is exponential in $c$. Therefore $(T_c(P))^M$
can also be list-decoded using the channel estimates with list
size exponential in $Mc$. The decoding condition (43) can be
used to bound the list size at decoding. The final step is to
sample codewords from $(T_c(P))^{n/c}$. The subsampling ensures
a constant bound on the list size for all decoding times.

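Lemma 4: Let $\mathcal{W}$ be an AVC. For any $\delta > 0$ and $\xi > 0$,
$P \in \mathcal{P}(\mathcal{X})$ with $\min_x P(x) > 0$ there is a $c$ sufficiently large
and a function such that for any $\mathcal{V} \in \mathbb{V}(c)$ the set $T_c(P)$
is a list-decodable code of blocklength $c$ with $N$ messages and
list size $L(\mathcal{V})$ for the AVC $\mathcal{W}$ under nosy maximal error with

$$N = |T_c(P)| \ge \exp\left( c (H(X) - \xi) \right) ,$$

$$L(\mathbf{y}_1^c, \mathcal{V}) \le \exp\left( c \left( \max_{V \in \mathcal{V}(\mathbf{y}_1^c, \delta)} H(X|Y) + \xi \right) \right) , \qquad (44)$$

and error

$$\varepsilon_L \le \exp(-c \cdot E_1(\xi)) ,$$

where $H(X)$ is calculated with respect to the distribution $P(x)$
and for $V \in \mathcal{V}(\mathbf{y}_1^c, \delta)$ the conditional entropy is with
respect to the distribution $P(x)V(y|x)$, and $E_1(\xi) > 0$ for
$\xi > 0$.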

With the previous lemma as a basic building block, we can
create nested list-decodable codes where c is chosen to be
large enough to satisfy the conditions of Lemma 4.

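Lemma 5 (Concatenated exponential list codes): Let $\mathcal{W}$
be an AVC. For any $\delta > 0$ and $\xi > 0$, $P \in \mathcal{P}(\mathcal{X})$
with $\min_x P(x) > 0$, there is a $c$ sufficiently large such
that the set $(T_c(P))^M$ is a list-decodable code with
blocklength $Mc$, $N_M$ messages and list size $L(\mathbf{y}_1^{Mc}, \mathcal{V}_1^M)$ for
$\mathcal{V}_1^M = (\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_M) \in \mathbb{V}(c)^M$, where

$$N_M \ge \exp\left( Mc (H(X) - \xi) \right)$$

and

$$L(\mathbf{y}_1^{Mc}, \mathcal{V}_1^M) \le \exp\left( c \left( \sum_{m=1}^{M} \max_{V \in \mathcal{V}_m(\mathbf{y}^{(mc)}, \delta)} H(X_m | Y_m) + M\xi \right) \right) , \qquad (45)$$

and maximal probability of error

$$\varepsilon_L \le M \exp(-c E_2(\xi)) , \qquad (46)$$

where $H(X)$ is calculated with respect to the distribution $P(x)$
and for a channel $V \in \mathcal{V}_m(\mathbf{y}^{(mc)}, \delta)$ the conditional entropy
$H(X|Y)$ is with respect to the distribution $P(x)V(y|x)$, and
$E_2(\xi) > 0$.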

Our codebook is constructed by sampling codewords from
the codebook $(T_c(P))^{n/c} = (T_c(P))^{M^*}$. Truncating this set
to blocklength $Mc$ gives $(T_c(P))^M$. We want to show that
for each $M$ the sampled codewords can be used in a list
decodable code with constant list size $L$. We can define for
each truncation $M$, output sequence $\mathbf{y}_1^{Mc}$, and side information
sequence $(\mathcal{V}_1, \ldots, \mathcal{V}_M)$ a "decoding bin"

$$\mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M) \subseteq \mathcal{X}^{Mc} ,$$

which is the list given by the code in Lemma 5. The size of
each bin can be upper bounded by (45):

$$|\mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M)| \le \exp\left( c \left( \sum_{m=1}^{M} \max_{V \in \mathcal{V}_m(\mathbf{y}^{(mc)}, \delta)} H(X_m | Y_m) + M\xi \right) \right) .$$

Lemma 6 (Constant list size): Let $\mathcal{W}$ be an AVC with cost
function $l(\cdot)$. For any $\varepsilon > 0$, $P \in \mathcal{P}(\mathcal{X})$ with $\min_x P(x) > 0$,
for sufficiently large blocklength $n$ there exists a set of
$N(n) = \exp(n R_{\min})$ codewords $\{\mathbf{x}(j) : j \in [N]\} \subset (T_c(P))^{n/c}$
such that for any CSI sequence $(\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_{M^*})$
and channel output $\mathbf{y}$ with decoding time $M$ given by (43), the
truncated codebook $\{\mathbf{x}^{(Mc)}(j) : j \in [N]\}$ is a list decodable
code with list size $L$ satisfying

$$L \ge \frac{12 \log |\mathcal{Y}|}{\varepsilon}$$

and maximal probability of decoding error

$$\varepsilon_L(M) \le M \exp(-c E(\varepsilon)) ,$$

where $E(\varepsilon) > 0$.

C. Proof of Theorem 3

Proof: We will use the codebook from Lemma 6. Since
the set of messages is of fixed size $N$, we use the construction of
Lemma 2. This makes the code, when decoded after $M$
chunks, an $(Mc, \exp(n R_{\min})/\sqrt{K(n)}, K(n))$ randomized
code with probability of error

$$\varepsilon(M, \mathbf{s}) \le M \exp(-c E(\varepsilon)) + \frac{2 L \log N}{\sqrt{K} \log K} .$$

Then we can choose $L = 12 (\log |\mathcal{Y}|)/\varepsilon$ to get

$$\varepsilon(M, \mathbf{s}) \le M \exp(-c E(\varepsilon)) + \frac{24 n R_{\min} \log |\mathcal{Y}|}{\varepsilon \sqrt{K} \log K} .$$

Finally, we must show that the loss in rate is small, assuming
ε-consistent state information. But this follows because by (27),
for all m

I (P,V m ) - I (P,V m ) <e . 

Therefore the average of mutual informations in (43) is at most
ε smaller than the averages with the true channels and hence
we get the bound on the decoding time. ■

VII. Discussion 

In this paper we constructed rateless codes for two different 
channel models with time varying state based on arbitrarily 
varying channels. In the first model, the state cannot depend 
on the transmitted codeword, and in the second model it can. 
By adapting previously proposed derandomization strategies, 
we showed that a sublinear amount of common randomness 
is sufficient. The first approach [13] subsamples a randomized
code and the second [6] is based on list decoding. The latter
strategy may be interesting from a practical standpoint given
recent attention to list decoding with soft information [41]. The
common randomness needed for our codes can be established
via a zero-rate feedback link, which means that a secure
control channel of small rate is all that is required to enable
reliable communication in these situations. In particular, we
can partially derandomize the construction proposed in [12] for
communicating over channels with individual state sequences.

We also found the capacity $\hat{C}_r(\Lambda)$ for AVCs with "nosy
noise." For these channels the jammer has knowledge of
the transmitted codeword and we showed the randomized
coding capacity $\hat{C}_r(\Lambda)$ is equal to $C_{dep}(\Lambda)$. Although in some
examples $\hat{C}_r(\Lambda)$ may equal the capacity under maximal error
$C_r(\Lambda)$, in general it is smaller. It is interesting to note that
the jammer's worst strategy for nosy noise is to make a
"memoryless attack" on the input by choosing the state $s_t$
according to the minimizing conditional distribution $U(s|x_t)$
in (14). In contrast, if the jammer is given strictly causal
knowledge of the input sequence, Blackwell et al. [14] showed
that the capacity is given by $C_{std}$, which is also the capacity
when the jammer has no knowledge of the input sequence.
Thus from the jammer's perspective, causal information about
$\mathbf{x}$ is as good as no knowledge, and full knowledge is as good
as knowledge of the current input.

One interesting model for these point-to-point channels that 
we did not address is the case where the jammer has noisy 
access to the transmitted codeword. This can happen, for 
example, when the jammer is eavesdropping on a wireless 
multihop channel. Our derandomization strategies are tailored 
to the extreme ends of our channel model, where the jammer 
has no knowledge or full knowledge. A unified coding scheme 
that achieves capacity for a range of assumptions on the jam- 
mer's knowledge may help unify the two approaches. Finally, 
although the results in this paper are for finite alphabets, 
extensions to continuous alphabets and the Gaussian AVC 
setting [20], [23], [42] should be possible using appropriate 
approximation techniques. An interesting rateless code using
lattice constructions has been proposed by Erez et al. in [43],
and it would be interesting to see if that approach can work
for more robust channel models.

Appendix

Technical proofs have been deferred to this appendix.

A. Proof of Lemma 3

Proof: Fix $\delta > 0$, $R_{\min} > 0$ and $P$. We will prove that
for each $M \in \mathcal{M} = \{M_*, \ldots, M^*\}$ there exists a randomized
codebook $\mathbf{C}_M$ of blocklength $Mc$ with rate

$$R_M = \frac{n R_{\min}}{Mc} .$$

Let $\Lambda_M$ be defined by

$$R_M = I\left(P, \mathcal{W}_{std}(\Lambda_M)\right) . \qquad (47)$$

The distribution of the codebook $\mathbf{C}_M$ will be the same as
the distribution of the codebook $\mathbf{C}_{M^*}$ of blocklength $M^*c$
truncated to blocklength $Mc$.

Standard randomized codebook. Fix $M$ and let $\mathbf{A}_M$ be a
randomized codebook of $A$ codewords drawn uniformly from
the constant-composition set $T_{Mc}(P)$ with maximum mutual
information (MMI) decoding. Choose $A$ such that

$$\frac{1}{Mc} \log A < I\left(P, \mathcal{W}_{std}(\Lambda_M)\right) - \delta/2 .$$

From Hughes and Thomas [25, Theorem 1] the following
exponential error bound holds for all messages $i$ and state
sequences $\mathbf{s} \in \mathcal{S}^{Mc}$ with $l(\mathbf{s}) \le (Mc)\Lambda_M$:

$$\varepsilon_M(\mathbf{A}_M, i, \mathbf{s}) \le \exp\left( -Mc \left[ E_r\left( \frac{1}{Mc} \log A + \delta/2, P, \Lambda_M \right) - \delta/2 \right] \right) \triangleq \zeta_M . \qquad (48)$$

The exponent $E_r(\cdot)$ is positive as long as the first argument is
smaller than $I(P, \mathcal{W}_{std}(\Lambda_M))$. Therefore we have the same
bound on the average error:

$$\frac{1}{A} \sum_{i=1}^{A} \varepsilon_M(\mathbf{A}_M, i) \le \zeta_M .$$

Thinning. Let $\mathbf{B}_M$ be a random codebook formed by selecting
$B$ codewords from $\mathbf{A}_M \cap (T_c(P))^M$. That is, we keep $B$
codewords which are piecewise constant-composition with
composition $P$. We declare an encoding error if
$|\mathbf{A}_M \cap (T_c(P))^M| < B$. We use a combinatorial bound from [12]:

$$\frac{|T_c(P)|^M}{|T_{Mc}(P)|} \ge \exp\left( -M \log(Mc) \eta(P) \right) \triangleq \gamma_M , \qquad (49)$$

where $\eta(P) < \infty$ is a positive constant. Since $\mathbf{A}_M$ is formed
by iid draws from $T_{Mc}(P)$, the event that codeword $i$ from
$\mathbf{A}_M$ is also in $(T_c(P))^M$ is distributed according to a Bernoulli
random variable with parameter at least $\gamma_M$. The size of
$\mathbf{A}_M \cap (T_c(P))^M$ is therefore the sum of iid Bernoulli random
variables and the chance of encoding error can be bounded
using Sanov's Theorem [44]:

$$\mathbb{P}\left( |\mathbf{A}_M \cap (T_c(P))^M| < B \right) \le (A + 1)^2 \exp\left( -A \cdot D(B/A \,\|\, \gamma_M) \right) .$$

Choose $B = \gamma_{M^*} A$. Then we can make the probability that
$|\mathbf{A}_M \cap (T_c(P))^M| < B$ as small as we like and much smaller
than the decoding error bound. Furthermore, this bound holds
for all $M \in \mathcal{M}$. Therefore a sub-codebook of $B$ piecewise
constant-composition codewords exists with high probability.

The encoder using $\mathbf{B}_M$ now operates as follows: it draws
a realization of a codebook and declares an error if the
realization contains fewer than $B$ codewords. If there is no
encoding error it transmits the $i$-th codeword in the codebook
for message $i \in [B]$. The average error on the fraction
$B/A = \gamma_{M^*}$ of preserved codewords can be at most $A/B$
times the original average error:

$$\frac{1}{B} \sum_{i=1}^{B} \varepsilon_M(\mathbf{B}_M, i) \le \frac{\zeta_M}{\gamma_{M^*}} .$$

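Permutation. We now form our random codebook $\mathbf{C}_M$
by taking the codebook induced by the encoder using $\mathbf{B}_M$ and
permuting the message index. The encoder using $\mathbf{C}_M$ takes
a message $i$, a randomly chosen permutation $\pi$ on $[B]$, and a
codebook $\mathcal{B}$ from $\mathbf{B}_M$ and outputs the codeword $\pi(i)$ from
$\mathcal{B}$. The maximal error for a message $i$ in this codebook is the
same as the average error of $\mathbf{B}_M$:

$$\varepsilon_M(\mathbf{C}_M, i) = \frac{1}{B!} \sum_{\pi} \varepsilon_M(\mathbf{B}_M, \pi(i)) = \frac{1}{B} \sum_{j=1}^{B} \varepsilon_M(\mathbf{B}_M, j) \le \frac{\zeta_M}{\gamma_{M^*}} .$$

For each $M$ we can construct a randomized codebook $\mathbf{C}_M$ as
described above.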
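The permutation step can be checked exactly on a toy example. The Python sketch below enumerates every relabeling of a small codebook and confirms that, after a uniformly random permutation of the message index, each message sees the average of the per-codeword errors. The error values are hypothetical and exact rational arithmetic is used so that the equality is not blurred by rounding.

```python
from fractions import Fraction
from itertools import permutations


def error_after_random_relabeling(per_codeword_error, message):
    """Average error seen by `message` when the message-to-codeword map
    is a uniformly random permutation (computed by full enumeration)."""
    B = len(per_codeword_error)
    total = Fraction(0)
    count = 0
    for perm in permutations(range(B)):
        total += per_codeword_error[perm[message]]
        count += 1
    return total / count


# Hypothetical per-codeword errors: one bad codeword, two perfect, one mediocre.
errors = [Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 10)]
average = sum(errors) / len(errors)
per_message = [error_after_random_relabeling(errors, i) for i in range(len(errors))]
```

Every entry of `per_message` equals `average`, so the worst message is no worse than the mean. This is the reason a bound on the average error of the thinned codebook carries over to the maximal error of the permuted one.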

Nesting. Now consider the codebook $\mathbf{C}_{M^*}$ of blocklength
$n = M^*c$ and set the size of the codebook $B$ to equal
$N(n) = \exp(n R_{\min})$. We must guarantee that the errors will
still be small. Since $B = \gamma_{M^*} A$, the rate of the codebook
$\mathbf{A}_{M^*}$ is

$$\rho_{M^*} = \frac{1}{M^*c} \log \frac{N}{\gamma_{M^*}} .$$

If we truncate $\mathbf{C}_{M^*}$ to blocklength $Mc$, the resulting
randomized code is identically distributed to $\mathbf{C}_M$. The rate for the
corresponding $\mathbf{A}_M$ can be bounded using (49), (31) and (32):

$$\rho_M = \frac{1}{Mc} \log \frac{N}{\gamma_{M^*}}
\le R_M + 2\eta(P) \frac{M^* \log(M^*c)}{Mc}
\le R_M + 2\eta(P) \frac{R_{\max}}{R_{\min}} \cdot \frac{\log n}{n^{1/4}} .$$

Therefore we can choose n sufficiently large so that the gap 
between pm and Rm can be made smaller than 6/2, so pu < 
Rm +5/2. Therefore using the definition of Am in Wf\ and 
the fact that Am > Am we have 

p M <l(p,W std (A M ))-S/2 , 
and the exponent in (|48T > is positive. Now, for (s, {A m }) such 



that (HTt holds, the error can be bounded: 

e(M,s,{A m }) 
Cm 



< 



7m* 



< exp (-Mc (Er (r m - 5/2, P, A M ) - 5/2 

exp (2r?(P)M*log(M*c)) 
= 0(eM-E 3 {5)Mc)) . 



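Rate loss. The last step is to compare $I(P, \mathcal{W}_{std}(\hat{\Lambda}_M))$
to the empirical mutual information induced by the true state
sequence. By assumption, the partial CSI is $\delta$-cost-consistent
so by (27),

$$\Lambda_M \le \hat{\Lambda}_M \le \Lambda_M + \delta .$$

Therefore

$$I\left(P, \mathcal{W}_{std}(\Lambda_M)\right) - I\left(P, \mathcal{W}_{std}(\hat{\Lambda}_M)\right) = O(\delta \log \delta^{-1}) .$$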

By the triangle inequality and 

$$I\left(P, \mathcal{W}_{std}(\Lambda_M)\right) - \frac{\log N}{Mc} = O(\delta \log \delta^{-1}) .$$

This proves (41). ■

B. Proof of Lemma 4

Proof: Fix $\xi > 0$ and $\delta > 0$. For an input distribution
$P(x)$ and channel $V(y|x)$, let $P'(y)$ be the marginal distribution
on $y$ and $V'(x|y)$ be the channel such that $P(x)V(y|x) = P'(y)V'(x|y)$.
Our decoder will output the set

$$\mathcal{L}(\mathbf{y}_1^c, \mathcal{V}) = \bigcup_{V \in \mathcal{V}(\mathbf{y}_1^c, \delta)} T_{V'}(\mathbf{y}_1^c) .$$

The size of this set is, by a union bound, upper bounded by
(44). The list coding results in [5] show that the probability
that either the transmitted codeword x ^ >C(yf, V) or 

yli U 

vev(yi,s) 

is upper bounded by

$$\varepsilon_L(\mathcal{V}) \le \exp(-c \cdot E_L(\xi))$$

for some positive function $E_L(\xi)$.

For $c$ sufficiently large, the size of this list can be bounded
by (44), and the error probability is still bounded by

$$\varepsilon_L(\mathcal{V}) \le \exp(-c \cdot E_L(\xi)) .$$

Thus, except with probability exponentially small in $c$, this set
will contain the transmitted $\mathbf{x} \in T_P$. Taking a union bound over
the $|\mathbb{V}(c)| = c^v$ possible values of the side information $\mathcal{V}$
shows that

$$\varepsilon_L \le \exp(-c \cdot E_L(\xi) + v \log c) ,$$

which gives the exponent $E_1(\xi)$. ■

C. Proof of Lemma 5

Proof: Choose $c$ large enough to satisfy the conditions
of Lemma 4. Our decoder will operate by list-decoding each
chunk separately. Let $L_m$ be the list size guaranteed by Lemma
4 for the $m$-th chunk. Then the corresponding upper bound
on $\prod_{m=1}^{M} L_m$ is the desired upper bound on $L(\mathcal{V}_1^M)$.
The probability of the list in each chunk not containing the
corresponding transmitted chunk can be upper bounded:

$$\varepsilon_L \le M \exp(-c E_1(\xi)) .$$

As long as $c$ grows faster than $\log M$ the decoding error will
still decay exponentially with the chunk size $c$. ■



D. Proof of Lemma 6

Proof: Fix $\varepsilon > 0$. We begin with the codebook
$(T_c(P))^{M^*}$. The truncation of this codebook to blocklength
$Mc$ for $M \in \mathcal{M}$ is the codebook defined in Lemma 5. Let
$\{Z_j : j \in [N]\}$ be $N = \exp(n R_{\min})$ random variables
distributed uniformly on the set $(T_c(P))^{M^*}$. The decoder
will operate in two steps: first it will decode the received
sequence into the exponential size list $\mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M)$ given
by the decoder of Lemma 5, and then it will output only
those codewords in the list which match one of the sampled
codewords $\{Z_j\}$. Note that the decoder for Lemma 5 has error
satisfying (46).

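For any $\delta > 0$ and $\xi > 0$ we can choose $c(n)$ sufficiently
large so that for any fixed $M$, $\mathbf{y}_1^{Mc}$, and $\mathcal{V}_1^M \in \mathbb{V}(c)^M$
that satisfy the conditions of the decoding rule in (43) the
probability that $Z_j$ lands in the list $\mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M)$ output
by the decoder of Lemma 5 is upper bounded:

$$\mathbb{P}\left( Z_j \in \mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M) \right)
\le \frac{|\mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M)|}{\exp\left( Mc (H(X) - \xi) \right)}
\le \exp\left( -c \sum_{m=1}^{M} I\left(P, \mathcal{V}_m(\mathbf{y}^{(mc)}, \delta)\right) + 2Mc\xi \right)
\triangleq G .$$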



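The random variable $\mathbf{1}(Z_j \in \mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M))$ is Bernoulli
with parameter smaller than $G$, so we can bound the probability
that $L$ of the $N$ codewords $\{Z_j\}$ land in the set
$\mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M)$ using Sanov's Theorem [44]:

$$\mathbb{P}\left( \frac{1}{N} \sum_{j=1}^{N} \mathbf{1}\left( Z_j \in \mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M) \right) > L/N \right)
\le (N + 1)^2 \exp\left( -N D(L/N \,\|\, G) \right) .$$

The exponent can be written as

$$N D(L/N \,\|\, G) = L \log \frac{L/N}{G} + N (1 - L/N) \log \frac{1 - L/N}{1 - G} .$$

To deal with the $(1 - L/N) \log((1 - L/N)/(1 - G))$ term
we use the inequality $-(1 - a) \log(1 - a) \le 2a$ (for small $a$)
on the term $(1 - L/N) \log(1 - L/N)$ and discard the small
positive term $-(1 - L/N) \log(1 - G)$:

$$N D(L/N \,\|\, G) \ge L \log \frac{L/N}{G} - N \cdot 2(L/N)
= L \left( -n R_{\min} + c \sum_{m=1}^{M} I\left(P, \mathcal{V}_m(\mathbf{y}^{(mc)}, \delta)\right) - 2Mc\xi \right) + L \log L - 2L . \qquad (50)$$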
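From the rule in (43), we know that $(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M)$ satisfies:

$$n R_{\min} < c \sum_{m=1}^{M} I\left(P, \mathcal{V}_m(\mathbf{y}^{(mc)}, \delta)\right) - Mc\varepsilon . \qquad (51)$$

Substituting this into (50) we see that

$$N D(L/N \,\|\, G) > L (Mc\varepsilon - 2Mc\xi) + L \log L - 2L .$$

For large enough $n$ we have the bound $(N + 1)^2 \le \exp(2 n R_{\min} + L)$.
For large enough $L$, $L \log L > 3L$, so we can ignore those
terms as well. This gives the following bound:

$$\mathbb{P}\left( \frac{1}{N} \sum_{j=1}^{N} \mathbf{1}\left( Z_j \in \mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M) \right) > L/N \right)
\le \exp\left( -LMc (\varepsilon - 2\xi) + 2 n R_{\min} \right) . \qquad (52)$$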



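The number of decoding bins $\mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M)$ can be
bounded by

$$\left| \left\{ \mathcal{B}(M, \mathbf{y}_1^{Mc}, \mathcal{V}_1^M) : \mathcal{V}_1^M \in \mathbb{V}(c)^M, \ \mathbf{y}_1^{Mc} \in \mathcal{Y}^{Mc} \right\} \right| \le |\mathcal{Y}|^{Mc} c^{Mv} .$$

Therefore we can take a union bound over all the decoding
bins in (52) to get an upper bound of

$$\exp\left( -LMc (\varepsilon - 2\xi) + Mc \log |\mathcal{Y}| + Mv \log c + 2 n R_{\min} \right) .$$

Since $\frac{n R_{\min}}{Mc} < \log |\mathcal{Y}|$ for all $M \ge M_*$, we can choose $n$
and $c$ sufficiently large such that the upper bound becomes

$$\exp\left( -LMc (\varepsilon - 2\xi) + 4Mc \log |\mathcal{Y}| \right) .$$

If $\varepsilon > 2\xi$ then we can choose

$$L > \frac{4 \log |\mathcal{Y}|}{\varepsilon - 2\xi}$$

to guarantee that subsampling will yield a good list-decodable
code for all $M \in \{M_*, \ldots, M^*\}$. Choosing $\xi = \varepsilon/3$ and
$E(\varepsilon) = E_2(\varepsilon/3)$, where $E_2(\cdot)$ is from (46), yields the result.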
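As a quick sanity check on the constants in this proof, the following Python sketch computes the smallest integer list size with $L > 4\log|\mathcal{Y}|/(\varepsilon - 2\xi)$, using $\xi = \varepsilon/3$ by default, and numerically checks the inequality $-(1-a)\log(1-a) \le 2a$ used to lower bound the exponent. Natural logarithms are assumed and the function names are illustrative.

```python
import math


def list_size(num_outputs: int, eps: float, xi=None) -> int:
    """Smallest integer L with L > 4 log|Y| / (eps - 2 xi); xi defaults to eps/3."""
    if xi is None:
        xi = eps / 3.0
    if eps <= 2.0 * xi:
        raise ValueError("need eps > 2 * xi for a positive exponent")
    return math.floor(4.0 * math.log(num_outputs) / (eps - 2.0 * xi)) + 1


def entropy_term_bound_holds(a: float) -> bool:
    """Check -(1 - a) log(1 - a) <= 2a for a in [0, 1)."""
    return -(1.0 - a) * math.log(1.0 - a) <= 2.0 * a


# Example: binary output alphabet and eps = 0.3 give 12 log 2 / 0.3, about 27.7.
L = list_size(2, 0.3)
```

With $\xi = \varepsilon/3$ the threshold becomes $12\log|\mathcal{Y}|/\varepsilon$, which is the list size appearing in Lemma 6. It depends on $\varepsilon$ and the output alphabet but not on $n$.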